Hierarchical Bayesian Language Modelling for the Linguistically Informed
Abstract
In this work I address the challenge of augmenting n-gram language models according to prior linguistic intuitions. I argue that the family of hierarchical Pitman-Yor language models is an attractive vehicle through which to address the problem, and demonstrate the approach by proposing a model for German compounds. In an empirical evaluation, the model outperforms the Kneser-Ney model in terms of perplexity, and achieves preliminary improvements in English-German translation.
Similar resources
Patient Safety and Healthcare Quality: The Case for Language Access
This paper describes the need for Culturally and Linguistically Appropriate Services (CLAS) for Limited English Proficient (LEP) patients, identifies how the lack of CLAS for LEP patients can compromise patient safety and healthcare quality, and discusses barriers to the provision of CLAS.
Extending Phrase-Based Decoding with a Dependency-Based Reordering Model
Phrase-based decoding is conceptually simple and straightforward to implement, at the cost of drastically oversimplified reordering models. Syntactically aware models make it possible to capture linguistically relevant relationships in order to improve word order, but they can be more complex to implement and optimise. In this paper, we explore a new middle ground between phrase-based and synta...
Bayesian Hierarchical Modelling for Tailoring Metric Thresholds
Software is highly contextual. While there are cross-cutting ‘global’ lessons, individual software projects exhibit many ‘local’ properties. This data heterogeneity makes drawing local conclusions from global data dangerous. A key research challenge is to construct locally accurate prediction models that are informed by global characteristics and data volumes. Previous work has tackled this pro...
A Model for Tax Evasion Forcasting based on ID3 Algorithm and Bayesian Network
Nowadays, knowledge is a valuable and strategic source as well as an asset for evaluation and forecasting. Presenting these strategies in discovering corporate tax evasion has become an important topic today and various solutions have been proposed. In the past, various approaches to identify tax evasion and the like have been presented, but these methods have not been very accurate and the ove...
Publication date: 2012